Abstract

We jointly model longitudinal values of a psychometric test and
diagnosis of dementia. The model is based on a continuous-time
latent process representing cognitive ability. The link between the latent
process and the observations is modeled in two phases. Intermediate
variables are noisy observations of the latent process; scores of the
psychometric test and diagnosis of dementia are obtained by
categorizing these intermediate variables. We propose maximum likelihood
inference for this model and algorithms for performing this
task. We estimated the parameters of such a model using the data of
the five-year follow-up of the PAQUID study. This analysis
yielded interesting results about the effect of educational level on
both latent cognitive ability and specific performance in the Mini
Mental State Examination. The predictive ability of the model is illustrated
by predicting diagnosis of dementia at the eight-year follow-up of the
PAQUID study based on the information of the first five years.



Key words: latent process, Brownian motion, joint model, ordinal data, mul- 
1 Introduction



Alzheimer's disease is clinically characterized by a progressive decline of cog- 
nitive abilities and is the main cause of dementia. This feature has two 
important consequences for the modeling. First it is only an idealization to 
consider that the disease starts at a particular moment. The diagnosis is 
made at the time of examination by a neurologist but this does not mean 
that the disease started at this precise moment, nor even at any precise mo- 
ment before examination. The second consequence is that psychometric tests 
which measure cognitive abilities can provide important information regard- 
ing the progression of a pathological process which may lead to a diagnosis 
of Alzheimer's disease or dementia. It is interesting to devise models which 
link the two types of information (diagnosis of dementia and psychometric 
tests) with three main objectives: to better understand this link, to increase 
the power for detecting risk factors, to predict dementia using previous ob- 
servations of scores of psychometric tests. 

The problem can be tackled through joint modeling of an event (onset of 
dementia) and a longitudinal marker (scores of a psychometric test). Joint 
modeling of CD4 cell counts and onset of AIDS or death has been proposed 
by Faucett and Thomas (1996) and Wulfsohn and Tsiatis (1997). Concerning 
dementia a model has been proposed by Jacqmin-Gadda, Commenges and 
Dartigues (2005), with the specific aim of estimating a change-point in the 
regime of cognitive decline. Approaches based on a stochastic process frame- 
work are particularly well suited to grasp the dynamics of diseases. Hender- 
son, Diggle and Dobson (2000) proposed a model in which a latent process 
acts as a time-dependent variable in a proportional hazards model. Other
approaches to joint modeling represent the event as the crossing of a barrier by
the latent process (Whitmore, Crowder and Lawless, 1998; Lee, DeGruttola 
and Schoenfeld, 2000). This approach was developed by Hashemi, Jacqmin- 
Gadda and Commenges (2003) and applied to joint modeling of dementia 
and a psychometric test: in this model the latent process was interpreted as 
representing cognitive ability. The present paper proposes an evolution of
this work with important differences which make the model much more
flexible, and thus more usable; in particular, for technical reasons, the
Hashemi-Jacqmin-Gadda-Commenges model was restricted to a linear time trend
for the latent process.

We propose a new model which allows the diagnosis of dementia and
scores on a psychometric test to be analyzed together. The model looks
particularly non-standard for dementia because we do not model onset of
dementia but diagnosis of dementia at the time of visit. This is in fact more
realistic (even though interval-censoring was treated in the
Hashemi-Jacqmin-Gadda-Commenges model) because onset of dementia is an
abstraction; cognitive decline is in fact most often progressive. Thus our
basic model is that a neurologist makes a diagnosis of dementia if the
subject has a latent process below a certain threshold at the time of visit.
As for scores of the psychometric test, we consider a grid of threshold
values $c_m$, and the subject has score $m$ if the latent process falls
between $c_m$ and $c_{m+1}$ at the time of visit. This is a refined model
compared with previous works treating ordinal scores as continuous. With
this approach, both diagnosis of dementia and score of the psychometric test
are categorized observations of the latent process. This is reminiscent of
probit models for ordinal data (McCullagh and Nelder, 1989; Chib and
Greenberg, 1998), but here the underlying latent process makes it possible to
capture the dynamics of the phenomenon under study. Our model is in fact
slightly more complicated than the above description, as will be described
later.

In section 2 we present a general form of the model which could be ap- 
plied to contexts other than cerebral ageing. In section 3 the identifiability 
is studied and the likelihood is derived. In section 4 we come to the specific 
model used for dementia and the Mini Mental State Examination: we begin
by describing the PAQUID study, a large cohort study on ageing which
provides the data we used; then we describe the model, present a small sim- 
ulation and give results, particularly on the predictive ability of the model. 
We end with a short conclusion. 

2 Model and observations 
2.1 Outline of the model 

We propose a general model for multidimensional longitudinal data based on
a latent process. The observation of type $k$ for subject $i$ at time
$t_{ij}$ will be denoted $Y_{ij}^k$ (in our application we will use
observations of two types: $k = 1$: diagnosis of dementia, $k = 2$: a
psychometric test). As in Dunson (2003), we propose a hierarchical structure
where the observations $Y_{ij}^k$ are possibly coarsening transformations of
latent variables $\theta_{ij}^k$, and these latent variables are related to
common latent elements.

The common latent element in our model is a latent process $\Lambda_i(t)$
which is defined in continuous time (in contrast with Dunson's model). In
our application it is natural to consider that subjects have a certain
cognitive ability quantitatively represented by $\Lambda_i(t)$ for any $t$,
not only at measurement times. This approach also makes it possible to treat
unequally spaced observation times which may differ from one subject to
another. The model for the latent process, driven by a Brownian motion,
yields a natural correlation structure for the intermediate latent variables
$\theta_{ij}^k$, without introducing additional parameters which would have
to be estimated.

Another trait of our model is that it may be non-linear in the parameters.
In the next section we present the model in the most general form that can
be easily treated with our approach because it preserves the normality of
the $\theta_{ij}^k$. Finally the model is a kind of multivariate probit model
(Chib and Greenberg, 1998): it has a more direct interpretation than assuming
that the $\theta_{ij}^k$ are related to the canonical parameters of a
distribution in the exponential family, and it is related to threshold models
already used by Hashemi, Jacqmin-Gadda and Commenges (2003) in this
application. Moreover it leads to simpler numerical integrals.

Because of the central role of the latent process in our model, we will start 
by describing it, specifying afterwards how it can be observed. We consider 
that there might be other observations (for instance other psychometric tests) 
at other times; this would not affect our latent process which has an intrinsic 
meaning. 

2.2 Latent process 

For each subject $i$ we introduce $\Lambda_i = (\Lambda_i(t))_{t \ge 0}$, a
continuous-time stochastic latent process; in our application $\Lambda_i(t)$
will represent the global cognitive ability of subject $i$ at time $t$. This
latent process is modeled as a function of explanatory variables as:

$$\Lambda_i(t) = f(\beta, x_i(t)) + F(\gamma, z_i(t))^\top a_i + W_i(t), \qquad (1)$$

where $W_i = (W_i(t))_{t \ge 0}$ is a standard Brownian motion. The $q$-vector
of random effects $a_i$ has a multivariate normal distribution:
$a_i \sim \mathcal{N}(0, A)$; $a_i$ and $W_i$ are independent and the sets
$(a_i, i = 1, \dots, n)$ and $(W_i, i = 1, \dots, n)$ are sets of independent
random vectors and processes; the functions $f(\cdot,\cdot)$:
$\mathbb{R}^p \times \mathbb{R}^l \to \mathbb{R}$ and $F(\cdot,\cdot)$:
$\mathbb{R}^p \times \mathbb{R}^l \to \mathbb{R}^q$ are differentiable and
possibly non-linear; $\beta$ and $\gamma$ are vectors of coefficients (some of
which may be interpreted as regression coefficients, others of which are used
to parameterize the non-linearity) and $x_i(t)$ and $z_i(t)$ are vectors of
time-dependent covariates including $t$ itself.

A linear model for the latent process,
$\Lambda_i(t) = x_i(t)^\top \beta + z_i(t)^\top a_i + W_i(t)$,
is a particular case of model (1). Note that in a linear model there is no
parameter $\gamma$.

In the application we might consider the non-linear model:
$\Lambda_i(t) = \beta_1 + \beta_2 x_{i2} + (\beta_3 + \beta_4 x_{i2}) x_{i1}(t)^{\beta_5} + a_{1i} + W_i(t)$,
where $x_{i1}(t) = t$ is time itself and $x_{i2}$ represents educational
level. This model is non-linear in time, but also in the parameters;
parametrizing the power of time ($\beta_5$) offers more flexibility
in modeling the effect of time.
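
To make the role of the Brownian motion and of unequally spaced visit times concrete, here is a minimal simulation sketch of one subject's latent path under the non-linear example model. The function names, the parameter values and the restriction to a single random intercept are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def latent_mean(t, ed, beta):
    """Mean trend (b1 + b2*ed) + (b3 + b4*ed) * t**b5 of the example model."""
    b1, b2, b3, b4, b5 = beta
    return (b1 + b2 * ed) + (b3 + b4 * ed) * t ** b5


def simulate_latent(times, ed, beta, sigma_a, rng):
    """Simulate the latent process at increasing times >= 0 (any spacing)."""
    a = rng.gauss(0.0, sigma_a)          # random intercept, N(0, sigma_a^2)
    w, prev, path = 0.0, 0.0, []
    for t in times:
        # Brownian increment over [prev, t] has variance t - prev
        w += rng.gauss(0.0, math.sqrt(t - prev))
        prev = t
        path.append(latent_mean(t, ed, beta) + a + w)
    return path


# Hypothetical parameter values, chosen only for illustration
beta = (38.5, 1.0, -0.05, 0.0, 1.5)
path = simulate_latent([2.0, 3.0, 5.0, 7.0], ed=1, beta=beta, sigma_a=2.0,
                       rng=random.Random(0))
print([round(v, 2) for v in path])
```

Because the Brownian increments are drawn over the actual gaps between visits, nothing in the sketch requires the visit times to be equally spaced or shared across subjects.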

2.3 Observation equations. 

We consider that the values of "tests" at different time points are indirect 
observations of the latent process; in our application the "tests" include both 
psychometric tests and diagnosis of dementia. We model the link between 
the latent process and the tests in two phases: first we introduce, for
subject $i$, intermediate random variables $\theta_{ij}^k$ which can be seen
as potential measurements for each test $k = 1, \dots, K$ of
$\Lambda_i(t_{ij})$; secondly we represent the values of the tests as
functions of these intermediate variables. The reason for differentiating
these two phases is that the $\theta_{ij}^k$ are linear in
$\Lambda_i(t_{ij})$ and have normal distributions while the test functions
may be non-linear and discontinuous. The times $t_{ij}$ will be treated as
deterministic. They might be random but under the condition that the
mechanism leading to incomplete data is ignorable, a condition under which
the likelihood treating these times as fixed leads to the same inference as
the correct likelihood. We make the same assumption for possibly missing
data.

2.3.1 Definition of $\theta_{ij}^k$

The intermediate variables for subject $i$ and for test $k$ are defined as:

$$\theta_{ij}^k = \Lambda_i(t_{ij}) + g^k(\beta^k, x_i^k(t_{ij})) + G^k(\gamma^k, z_i^k(t_{ij}))^\top d_i^k + \varepsilon_{ij}^k, \qquad (2)$$

for $j = 1, \dots, n_i$, where $g^k(\cdot,\cdot)$ and $G^k(\cdot,\cdot)$ are
analogous to $f(\cdot,\cdot)$ and $F(\cdot,\cdot)$ in the definition of the
latent process but are specific to the $k$-th test; $d_i^k$ is an
$r_k$-dimensional random vector with normal distribution:
$d_i^k \sim \mathcal{N}(0, D^k)$; the measurement errors
$\varepsilon_{ij}^k$ are independent and identically distributed (i.i.d.)
variables with normal distributions:
$\varepsilon_{ij}^k \sim \mathcal{N}(0, \sigma^2_{\varepsilon_k})$, for all
$j$. The triple $(\Lambda_i(t_{ij}), d_i^k, \varepsilon_{ij}^k)$ is a set of
independent variables for any choice of $i, j, k$.

A linear model for the intermediate variables,
$\theta_{ij}^k = \Lambda_i(t_{ij}) + x_i^k(t_{ij})^\top \beta^k + z_i^k(t_{ij})^\top d_i^k + \varepsilon_{ij}^k$,
is a particular case of model (2).

2.3.2 Link between $\theta_{ij}^k$ and the data: the test functions

For subject $i$, we denote $Y_{ij}^k$ the random variable representing the
observation of the $k$-th test on the occasion of the $j$-th visit at time
$t_{ij}$. We will consider the cases of ordinal (including binary)
longitudinal data. We consider a test $k$ for which $M_k$ ordered values are
possible ($m \in [0, M_k - 1]$). Observation of $Y_{ij}^k = m$ provides the
information that $\theta_{ij}^k$ lies between two thresholds, that is,
$Y_{ij}^k = m$ if and only if $c_m^k < \theta_{ij}^k < c_{m+1}^k$, with
$c_0^k = -\infty$ and $c_{M_k}^k = +\infty$. The test function (which is the
function of $\theta_{ij}^k$ that equals $Y_{ij}^k$) is in this case a step
function. The cut-off points $c_m^k$ are not known and must be parameterized
or estimated directly according to the number of possible values $M_k$.
Generally we shall represent $c_m^k$ as a function of parameters $\eta_k$,
the dimension of which may be less than $M_k - 1$ in order to obtain a more
parsimonious model:
$c_m^k = \tau^k(m, \eta_k)$, $\forall m \in [1, M_k - 1]$, where
$\tau^k(\cdot\,; \eta_k)$ is a monotone function.

Binary data are simply a special case of ordinal data for which we only
need one cut-off point, $\eta_k$ for instance. For a binary test,
$Y_{ij}^k = 1_{\{\theta_{ij}^k > \eta_k\}}$.
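
The step-function link between an intermediate variable and an ordinal score can be sketched in a few lines. The helper name `score_from_theta` and the cut-off values are hypothetical; the only ingredient taken from the text is that the score is the index of the interval between consecutive cut-off points in which the intermediate variable falls.

```python
import bisect


def score_from_theta(theta, cutoffs):
    """Return m such that c_m < theta < c_{m+1}.

    `cutoffs` holds the finite cut-off points c_1 < ... < c_{M-1};
    c_0 = -inf and c_M = +inf are implicit.
    """
    # The number of cut-off points strictly below theta is the category
    return bisect.bisect_left(cutoffs, theta)


cutoffs = [0.0, 1.0, 2.0]          # hypothetical test with M = 4 categories
scores = [score_from_theta(x, cutoffs) for x in (-0.5, 0.5, 1.5, 3.0)]
print(scores)                      # [0, 1, 2, 3]
```

A binary test is the special case of a one-element cut-off list, for which the function returns 1 exactly when the intermediate variable exceeds the single cut-off.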

3 Likelihood Inference 

For establishing the likelihood we will first study the distribution of the 
intermediate variables. Then we establish the likelihood for the case where 
the tests are ordinal variables as in our application. 

3.1 Joint distribution of the intermediate variables 

We shall study the distribution of the $Kn_i$-vector
$\theta_i = (\theta_{ij}^k;\ k = 1, \dots, K;\ j = 1, \dots, n_i)$. It is to
be noted that in equations (1) and (2) linearity in the random effects is
assumed: this requirement is important to remain in a Gaussian framework;
that is to say $\theta_i \sim \mathcal{N}(\mu_i, \Sigma_i)$. Thus computing
the distribution of $\theta_i$ comes down to computing its mean vector
$\mu_i$ and variance-covariance matrix $\Sigma_i$. The expectation can easily
be computed since we have:

$$E(\theta_{ij}^k) = f(\beta, x_i(t_{ij})) + g^k(\beta^k, x_i^k(t_{ij})).$$

The variance of $\theta_i$ is the sum of the variance coming from the latent
process $\Sigma_i^{\Lambda}$, the variance of the test-specific random
effects $\Sigma_i^{d}$ and the variance of the noise term
$\Sigma_i^{\varepsilon}$:

$$\Sigma_i = \Sigma_i^{\Lambda} + \Sigma_i^{d} + \Sigma_i^{\varepsilon},$$

where the $n_i \times n_i$ matrix
$\Sigma_i^{0\Lambda} = F_i^\top A F_i + T_i$ gives the latent-process
covariance between visit times, and $T_i$ is the covariance matrix associated
with the Brownian motion, with $(j, j')$ entry $\min(t_{ij}, t_{ij'})$,
and $F_i = (F(\gamma, z_i(t_{i1})), \dots, F(\gamma, z_i(t_{in_i})))$, a
$q \times n_i$ matrix, and where the block for test $k$ is
$\Sigma_i^{kd} = G_i^{k\top} D^k G_i^k$ with
$G_i^k = (G^k(\gamma^k, z_i^k(t_{i1})), \dots, G^k(\gamma^k, z_i^k(t_{in_i})))$,
an $r_k \times n_i$ matrix.
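
As a concrete check of the latent-process part of the covariance, the sketch below builds the matrix obtained by adding the random-effect term and the Brownian term for one subject. It relies on the standard fact that a standard Brownian motion has covariance min(s, t); the function names and the numerical values are illustrative assumptions.

```python
def brownian_cov(times):
    """Covariance of a standard Brownian motion: cov(W(s), W(t)) = min(s, t)."""
    return [[min(s, t) for t in times] for s in times]


def latent_cov(F, A, times):
    """Return F^T A F + T for F (q x n), A (q x q) and n visit times."""
    q, n = len(F), len(times)
    T = brownian_cov(times)
    cov = [[T[j][l] for l in range(n)] for j in range(n)]
    for j in range(n):
        for l in range(n):
            cov[j][l] += sum(F[r][j] * A[r][s] * F[s][l]
                             for r in range(q) for s in range(q))
    return cov


# Hypothetical example: one random intercept (q = 1) with variance 4
cov = latent_cov(F=[[1.0, 1.0, 1.0]], A=[[4.0]], times=[1.0, 3.0, 5.0])
for row in cov:
    print(row)
```

The correlation between visits comes entirely from the shared random effect and the shared Brownian path, so no extra correlation parameter is needed.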

3.2 Identifiability 

Clearly there must be some constraints on the parameters to ensure
identifiability. A thorough analysis is beyond the scope of this paper but we
give some insight into it. We can distinguish three sets of parameters:
$\boldsymbol{\beta} = (\beta, \beta^k, k = 1, \dots, K)$,
$\boldsymbol{\gamma} = (\gamma, A, \gamma^k, D^k, \sigma^2_{\varepsilon_k}, k = 1, \dots, K)$
and $\boldsymbol{\eta} = (\eta_k, k = 1, \dots, K)$,
and the whole set of parameters is
$\alpha = (\boldsymbol{\beta}, \boldsymbol{\gamma}, \boldsymbol{\eta})$. We
consider the case of the linear model for the sake of simplicity; in the
linear model there is no parameter $\gamma$ nor $\gamma^k$. Clearly in order
that $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$ be identifiable from
observation of the $Y_{ij}^k$ they should be identifiable from the
observation of the $\theta_{ij}^k$.

Let us now look at sufficient conditions for this. In the linear model there
is a matrix $A$ such that $E(\theta) = A\boldsymbol{\beta}$. A necessary and
sufficient condition for identifiability of $\boldsymbol{\beta}$ is
$r(A) = \dim(\boldsymbol{\beta})$, where $r(A)$ is the rank of $A$:
this happens if and only if the columns of $A$ are linearly independent. A
necessary condition for that is
$K \sum_i n_i \ge \dim(\boldsymbol{\beta})$. A sufficient condition of
identifiability of $\boldsymbol{\beta}$ is:

C1: (i) there is no collinearity of the explanatory variables; (ii) there is
no explanatory variable in one of the equations of the intermediate
variables.

Point (i) is common in all linear models. That C1 is sufficient for
identifiability of $\boldsymbol{\beta}$ can be seen from the structure of the
$A$ matrix.

Similarly for the identifiability of $\boldsymbol{\gamma}$ we consider the
condition:

C2: (i) There is no random effect for one of the equations of the
intermediate variables; (ii) we do not have that all the matrices
$F_i F_i^\top$ are equal.

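For instance if there is no random effect for test $k$ we have:
$\mathrm{var}_{\boldsymbol{\gamma}}(\theta_i^k) = F_i^\top A F_i + T_i + \sigma^2_{\varepsilon_k} I_{n_i}$.
If there was non-identifiability there would exist
$\boldsymbol{\gamma}' \ne \boldsymbol{\gamma}$ such that
$\mathrm{var}_{\boldsymbol{\gamma}'}(\theta_i^k) = \mathrm{var}_{\boldsymbol{\gamma}}(\theta_i^k)$,
which would entail:
$F_i^\top (A' - A) F_i = (\sigma^2_{\varepsilon_k} - \sigma'^2_{\varepsilon_k}) I_{n_i}$.
However the rank of the left-hand side is at most $q$ while the rank of the
right-hand side matrix is $n_i$ (when
$\sigma'_{\varepsilon_k} \ne \sigma_{\varepsilon_k}$). So unless $n_i = q$
for all $i$, this equality holds only if $A' = A$ and
$\sigma'_{\varepsilon_k} = \sigma_{\varepsilon_k}$. If $n_i = q$ for all $i$,
we could solve the equation to find $(A' - A)$ as a function of $F_i$,
leading to the additional requirement that $F_i F_i^\top$ be the same for
all $i$.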
As for the identifiability of the whole set of parameters from the
observation of the $Y_{ij}^k$, it is difficult to prove a sufficient
condition. There is at least an obvious non-identifiability case that can be
detected, and thus avoided. For fixed $\boldsymbol{\gamma}$ the distribution
of the $Y_{ij}^k$ depends only on the
$c_l^k - E_{\boldsymbol{\beta}}(\theta_{ij}^k)$ for
$l = 1, \dots, M_k - 1$, $k = 1, \dots, K$. If the model for the cut-off
points makes it possible to find $\eta'_k$ such that
$c_l^k(\eta'_k) = c_l^k(\eta_k) + \Delta$ for $l = 1, \dots, M_k - 1$,
$k = 1, \dots, K$, and if there is an intercept $\beta_1$ in the equation of
the latent process, then the distribution of the $Y_{ij}^k$ specified by
$\alpha'$, where $\alpha'$ is defined by $\eta'$,
$\beta'_1 = \beta_1 + \Delta$ and the other parameters equal to those of
$\alpha$, is the same as that specified by $\alpha$. To avoid this
non-identifiability case we may for instance give a fixed value to one
cut-off value or to the intercept $\beta_1$, a condition we call "C3".
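
The shift non-identifiability can be verified numerically in the simplest setting of a single normally distributed intermediate variable: moving the mean and all cut-off points by the same amount leaves every category probability unchanged. The function names and numbers below are illustrative assumptions.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probs(mean, sd, cutoffs):
    """P(Y = m) for Y obtained by cutting theta ~ N(mean, sd^2) at `cutoffs`."""
    cdf = [0.0] + [norm_cdf((c - mean) / sd) for c in cutoffs] + [1.0]
    return [cdf[m + 1] - cdf[m] for m in range(len(cdf) - 1)]


cutoffs, mean, sd, delta = [0.0, 1.0, 2.5], 0.7, 1.3, 4.0   # hypothetical
p = category_probs(mean, sd, cutoffs)
p_shifted = category_probs(mean + delta, sd, [c + delta for c in cutoffs])
print(max(abs(a - b) for a, b in zip(p, p_shifted)))   # zero up to rounding
```

Fixing one cut-off point (or the intercept) removes this degree of freedom, which is what a constraint of the C3 type achieves.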

In practice we recommend that conditions C1, C2 and C3 be applied, or
analogous conditions since these are particular cases of constraints that may
be put on the three levels of the model.

3.3 Likelihood 

We will first establish the individual contribution to the likelihood
$L_i(\alpha)$ for any subject $i$. We denote by $y_{ij}^k$ the (realized)
observation relative to the $k$-th test on the occasion of the $j$-th visit
at time $t_{ij}$, a realization of $Y_{ij}^k$. $L_i$ is the probability
according to the model of the observed trajectory, that is:

$$L_i(\alpha) = P[Y_{i1}^1 = y_{i1}^1, \dots, Y_{in_i}^1 = y_{in_i}^1, \dots, Y_{i1}^K = y_{i1}^K, \dots, Y_{in_i}^K = y_{in_i}^K].$$

We will now define the sets over which integration will be required. Let
$C_{ij}^k$ be the interval relative to observation $y_{ij}^k$ and
intermediate variable $\theta_{ij}^k$. If we define $C_i$ as the orthant
concerning subject $i$,
$C_i = \bigotimes_{j=1,k=1}^{n_i,K} C_{ij}^k$, we obtain for the entire path
concerning subject $i$:

$$L_i(\alpha) = P[Y_{ij}^k = y_{ij}^k,\ j = 1, \dots, n_i;\ k = 1, \dots, K] = P[\theta_i \in C_i].$$

As $\theta_i \sim \mathcal{N}(\mu_i, \Sigma_i)$, we just need to integrate
the multivariate normal probability density function
$\phi_{(\mu_i, \Sigma_i)}$ over the $C_i$ sets:

$$L_i(\alpha) = \int_{C_i} \phi_{(\mu_i, \Sigma_i)}(u)\, du.$$


Missing values cause no problem because if the value of test $k$ at time
$t_{ij}$ is missing, the integration set for this observation becomes
$]-\infty, +\infty[$, so this simply decreases the multiplicity of the
integral by one. It is possible to include a truncation condition by writing
a conditional likelihood. See the application section (4.3) for an
illustration. Independence over subjects makes it possible to obtain the
likelihood of the sample as $L(\alpha) = \prod_{i=1}^{n} L_i(\alpha)$.
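
The individual contribution is thus the probability that a multivariate normal vector falls in a product of intervals. The sketch below estimates such a probability by crude Monte Carlo sampling, which is a deliberately simple stand-in and not the numerical integration method used in the paper; all names and values are illustrative assumptions. A missing observation is handled by giving it the interval from minus infinity to plus infinity.

```python
import math
import random


def cholesky(S):
    """Lower-triangular L with L L^T = S (S symmetric positive definite)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def rect_prob_mc(mu, S, lower, upper, n_draws=20000, seed=0):
    """Monte Carlo estimate of P(lower < theta < upper), theta ~ N(mu, S)."""
    rng, L, n, hits = random.Random(seed), cholesky(S), len(mu), 0
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        theta = [mu[i] + sum(L[i][k] * z[k] for k in range(i + 1))
                 for i in range(n)]
        hits += all(lo < x < up for lo, x, up in zip(lower, theta, upper))
    return hits / n_draws


inf = float("inf")
# Two visits; the second observation is missing: interval (-inf, +inf)
p = rect_prob_mc(mu=[0.0, 0.0], S=[[1.0, 0.5], [0.5, 2.0]],
                 lower=[-inf, -inf], upper=[0.0, inf])
print(p)   # approximately P(theta_1 < 0) = 0.5
```

In practice a dedicated algorithm for multivariate normal rectangle probabilities is far more accurate than this sampling sketch for the same computing effort.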

3.4 Maximisation algorithm 

The likelihood is difficult to compute since each $L_i$ involves a multiple
integral, which has to be computed numerically (see Evans and Swartz, 2000,
for a review). However, an advantage of our model is that the integrals that
we have to compute are integrals of multivariate normal densities. Efficient
techniques exist for this task: in particular the algorithms proposed by Genz
(1992) allow us to compute such integrals up to a multiplicity of 20. The
multiplicity of the integral for computing $L_i$ is $Kn_i$. For instance in
our application we have $K = 2$ and $n_i = 4$, which leads to a multiplicity
of 8, a feasible problem with the Genz algorithm.
Maximum likelihood estimators can be obtained by using quasi-Newton
algorithms. We have considered a Marquardt algorithm (Marquardt, 1963)
and an algorithm used by Hedeker and Gibbons (1994) and Todem, Kim
and Lesaffre (2007), in which the Hessian of the log-likelihood is replaced
by the estimated variance matrix of the score. This algorithm has been
further studied and called the "Robust-variance scoring" (RVS) algorithm by
Commenges et al. (2006). An advantage of the RVS algorithm is that it
needs only first derivatives of the log-likelihood, and the standard errors
are obtained from the estimated variance matrix of the score at the maximum.
Our experience shows that the RVS algorithm is more than twice as fast as
the Marquardt algorithm in our problem.
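
The idea of replacing the Hessian by a quantity built from first derivatives only can be illustrated on a toy problem. The sketch below estimates the mean of unit-variance normal data with a Newton-like step in which the Hessian is replaced by the sum of squared individual scores (an outer-product, BHHH-style approximation). This is an assumption-laden illustration of the principle, not the RVS algorithm as implemented by the authors; the function name and data are hypothetical.

```python
def score_step(mu, data):
    """One Newton-like step for the mean of N(mu, 1) data using only scores.

    The individual scores are U_i = x_i - mu; the Hessian is replaced by the
    sum of squared individual scores (outer products in the vector case).
    """
    scores = [x - mu for x in data]
    total_score = sum(scores)
    score_var = sum(u * u for u in scores)   # stands in for minus the Hessian
    return mu + total_score / score_var


data = [1.0, 3.0, 1.0, 3.0]    # hypothetical sample with mean 2
mu = 0.0
for _ in range(20):
    mu = score_step(mu, data)
print(mu)   # approaches the sample mean 2.0
```

No second derivative is ever evaluated: each iteration needs only the individual score contributions, which is the practical appeal of this family of algorithms when the likelihood itself is expensive.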

4 Application 

4.1 The PAQUID study and the studied sample 

The proposed approach was applied to the joint modeling of diagnosis of de- 
mentia and a psychometric test, the Mini Mental State Examination (MMSE) 
(Folstein et al. 1975), using the data of the PAQUID cohort. 

The PAQUID program on cerebral aging is based on a large cohort ran- 
domly selected in a population of subjects aged 65 years or older, living at 
home in two administrative areas of southwest France (Gironde and
Dordogne). Our analysis bears on the first eight years of the follow-up of
this study. In addition to the initial visit, subjects were seen
approximately after one, three, five and eight years in Gironde and after
three, five and eight years in Dordogne; the successive visits are denoted by
T0, T1, T3, T5 and T8. At each visit the MMSE was measured and diagnosis of
dementia was made by neurologists, based on the DSM-III-R criteria (for
details see Letenneur et al., 1999). We will use the first five years to fit
the model and the eight-year follow-up to assess the predictive ability of
our model.

Our sample was composed only of women who were not demented at 
the initial visit. It is safer to analyse men and women separately because 
the dynamics of ageing seems to be quite different between the genders (see 
Commenges et al., 2004). Because there are more women than men in the 
PAQUID sample we chose to focus on women. We introduced the condition 
of being non-demented at the initial visit because it is doubtful that the 
PAQUID sample is representative of the whole population (demented and 
non-demented): demented subjects are often institutionalized. The condi- 
tion of being non-demented at entrance must be taken into account in the 
likelihood (see section 4.3). At the initial visit there were five cases which,
although not diagnosed as demented, obtained an MMSE score of zero (this
can be seen in Figure 1): these subjects had cognitive impairment due to
other causes than dementia (stroke, psychiatric illness); we have chosen to 
keep them in the sample. 

We thought that the evolution of cognitive ability may be strongly af- 
fected by dementia and it was not our aim to describe this evolution; in 
consequence, further observations of the MMSE after diagnosis of dementia 
were not taken into account. This artificial right-censoring is ignorable: the 
reason is that it is done on the basis of an observed variable included in the 
model and this can be proved using results of Commenges and Gegout-Petit 
(2005). 

Finally, our study sample was composed of 2131 women aged 65 years
or older who were not demented at the initial visit. During the
5-year follow-up we had 5622 observations of the MMSE. We also had 5742
assessments of dementia status; among them, 126 were diagnoses of
dementia.

4.2 The model applied to the PAQUID sample 
4.2.1 The explanatory variables 

The different components of the model we developed may depend on educa- 
tional level and a variable indicating whether the test was administered for 
the first time (to take into account a possible practice effect): educational 
level has been shown to be a risk factor of dementia (Letenneur et al., 1999) 
and a practice effect of the MMSE has been found (Jacqmin-Gadda et al., 
1997). Moreover, there has been debate about the necessity of correcting the 
MMSE for educational level in order to determine cognitive impairment, a 
prognostic factor of dementia. 

The most difficult problem is to define what time is in our model. Since 
we wish to relate cognitive decline to age it is natural to determine a time- 
scale for each subject closely related to age. We could consider that the time 
that is relevant for a subject is the time elapsed since her birth, that is, age. 
However, in this model we do not wish to model the evolution of cognitive 
ability from birth (we would have to develop a much more complicated model) 
but only the decline of cognitive ability from an age at which we think that 
this phenomenon may start for a non-negligible fraction of the population. 
We took as origin the age of 65 for the two following reasons: (i) we have
observations from age 65, making it awkward to take a later origin, which
would lead to negative times: particularly in a non-stationary (due to the
Brownian motion) and non-time-homogeneous (due to the non-linearity in $t$)
model this would not make sense; (ii) we have tried earlier time origins but
this yielded lower likelihoods.

Educational level is represented by the binary variable that we will denote
by $\mathrm{Ed}_i$, so that $\mathrm{Ed}_i = 1$ if subject $i$ has obtained a
primary school diploma and 0 if not. Practice effect, denoted by
$\mathrm{Pra}_i$, is defined as: $\mathrm{Pra}_i(t) = 1$ for
$t \le t_{i1}$ and $\mathrm{Pra}_i(t) = 0$ for $t > t_{i1}$.

For clarity of interpretation we will describe the model directly in terms
of $t$, $\mathrm{Ed}_i$ and $\mathrm{Pra}_i(t)$ rather than using the general
notations.

4.2.2 The latent process 

In this application of our model, the latent process represents cognitive
ability: diagnosis of dementia and MMSE will be considered as indirect
measurements of it. The latent process is defined by equation (1) in which we
specify $f(\cdot,\cdot)$ as:

$$f(\beta, x_i(t)) = (\beta_1 + \beta_2 \mathrm{Ed}_i) + (\beta_3 + \beta_4 \mathrm{Ed}_i)\, t^{\beta_5}.$$

As for the function $F(\cdot,\cdot)$ we tried:

$$F(\gamma, z_i(t))^\top a_i = a_{1,i} + a_{2,i}\, t^{\gamma_1}.$$

It was natural to assume $\gamma_1 = \beta_5$; that is, there is a vector of
random effects $a_i$ of size $q = 2$ bearing on the intercept $\beta_1$ and
the slope $\beta_3$. However the algorithm failed to converge when we tried
to estimate the two variance parameters and the correlation coefficient of
the two random effects, probably due to the presence of the Brownian motion.
The algorithm converged if we assumed a diagonal variance [...] only one
random effect obtained with $F(\gamma, z_i(t))^\top a_i = a_{1,i}$; since
this simpler model gave nearly the same result, we present this simpler model
in the following. For this model the latent process is defined as:

$$\Lambda_i(t) = (\beta_1 + \beta_2 \mathrm{Ed}_i) + (\beta_3 + \beta_4 \mathrm{Ed}_i)\, t^{\beta_5} + a_{1,i} + W_i(t). \qquad (3)$$

4.2.3 Observation equations. 

In this application, we jointly model the diagnosis of dementia and the MMSE
score, so that $K = 2$: the first "test" ($k = 1$) is diagnosis of dementia
and this is a binary variable; the second "test" ($k = 2$) is the MMSE which
has 31 values. The specification of the equations for the intermediate
variables is guided by interpretability and identifiability issues.

We have introduced a random effect in the model of the intermediate
variable $\theta_{ij}^1$ for diagnosis ($k = 1$). In formula (2) we took
$g^1(\beta^1, x_i^1(t_{ij})) = 0$ and $G^1(\gamma^1, z_i^1(t_{ij})) = 1$;
there was one random effect $d_i^1 \sim \mathcal{N}(0, \sigma^2_{d_1})$. This
random effect makes it possible that subjects with a low latent process are
not diagnosed demented; this may happen because some subjects have always
had low cognitive ability not linked to a neurodegenerative process. We did
not introduce an additional error term, that is to say
$\sigma_{\varepsilon_1} = 0$, nor explanatory variables (thus satisfying
condition C1 in section 3.2). Thus the intermediate variable for dementia is:

$$\theta_{ij}^1 = \Lambda_i(t_{ij}) + d_i^1. \qquad (4)$$

For relating this variable to the diagnosis of dementia (which means defining
the "test function") we just need one cut-off value given by the parameter
$\eta_0$: $Y_{ij}^1 = 1$ if and only if $\theta_{ij}^1 \le \eta_0$. Our
notation here for the parameters $\eta$ differs slightly from the general
case: we use $\eta_0$ for dementia and $\eta_1$, $\eta_2$ and $\eta_3$ for
the MMSE, the meaning of which is explained below.

As for the MMSE ($k = 2$) we took into account both the practice effect
and the specific impact of educational level on MMSE. The practice effect is
only located on the first visit ($j = 1$) and we introduced an interaction
with educational level (meaning that the practice effect may not be the same
for subjects with or without a primary school diploma). Thus in formula (2)
we took
$g^2(\beta^2, x_i^2(t_{ij})) = \beta_1^2 \mathrm{Ed}_i + \beta_2^2 \mathrm{Pra}_i(t_{ij}) + \beta_3^2 \mathrm{Ed}_i \times \mathrm{Pra}_i(t_{ij})$.
No specific random effect was introduced in the MMSE equation (condition C2),
so $G^2(\gamma^2, z_i^2(t_{ij})) = 0$. There was, however, an error term of
variance $\sigma^2_{\varepsilon_2}$. Thus, the intermediate variable for MMSE
was:

$$\theta_{ij}^2 = \Lambda_i(t_{ij}) + \beta_1^2 \mathrm{Ed}_i + \beta_2^2 \mathrm{Pra}_i(t_{ij}) + \beta_3^2 \mathrm{Ed}_i \times \mathrm{Pra}_i(t_{ij}) + \varepsilon_{ij}^2. \qquad (5)$$

MMSE takes values between 0 and 30, so we have $M_2 = 31$. It is judicious
to use a model for the family of cut-off points $c_m^2 = \tau^2(m, \eta)$
which is more parsimonious than considering all the cut-off values as
parameters. We have $c^2_{M_2} = +\infty$ and $c^2_0 = -\infty$, and to
satisfy condition C3 we fixed $c^2_{M_2-1}$ arbitrarily at the value
$c^2_{M_2-1} = 40$. There is no reason for the MMSE scale to be linear with
respect to the latent process scale, so we used the following model yielding
unequally spaced cut-off points:
$c_m^2 = 40 - \eta_1 (M_2 - 1 - m)^{\eta_2}$.
We limited this power model to $m \in [1, M_2 - 3]$ and we gave an
independent parameter $\eta_3$ for $c^2_{M_2-2}$, which made it possible to
improve the fit as compared to extending the above model up to $M_2 - 2$.
Thus our model for the test function for MMSE involves three parameters:
$\eta_1$, $\eta_2$ and $\eta_3$.
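
A short sketch may help to see how three parameters generate the whole grid of MMSE cut-off points. It assumes that the power model has the second parameter as the exponent, uses a hypothetical function name, and plugs in illustrative values (only 0.58 echoes an estimate quoted in the text).

```python
def mmse_cutoffs(eta1, eta2, eta3, n_values=31, top=40.0):
    """Finite cut-off points c_1, ..., c_{M-1} for a test with M = n_values."""
    M = n_values
    # Power model for m = 1, ..., M - 3
    cuts = [top - eta1 * (M - 1 - m) ** eta2 for m in range(1, M - 2)]
    cuts.append(eta3)    # free parameter for c_{M-2}
    cuts.append(top)     # c_{M-1} fixed at an arbitrary value
    return cuts


cuts = mmse_cutoffs(eta1=1.0, eta2=0.58, eta3=39.5)
low_gap = cuts[1] - cuts[0]       # c_2 - c_1, among low scores
high_gap = cuts[27] - cuts[26]    # c_28 - c_27, among high scores
print(round(low_gap, 3), round(high_gap, 3))
```

With an exponent below one the gaps between consecutive cut-off points widen toward the top of the scale, so one MMSE point spans a larger range of the intermediate variable at high scores than at low scores.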

4.3 The likelihood for the application

We computed the likelihood according to section 3.3. We also had to include
the selection condition mentioned in section 4.1: since only non-demented
subjects were included, the likelihood is conditional on
$\{\theta_{i1}^1 > \eta_0\}$ (the event that subject $i$ is not diagnosed
demented at initial visit $t_{i1}$); the conditional likelihood for subject
$i$ is $L_i / P(\theta_{i1}^1 > \eta_0)$. We obtain from the model:
$\theta_{i1}^1 \sim \mathcal{N}\big(f(\beta, x_i(t_{i1})), \Sigma_i(1,1)\big)$,
so that we have:

$$P(\theta_{i1}^1 > \eta_0) = 1 - \Phi\left(\frac{\eta_0 - f(\beta, x_i(t_{i1}))}{\sqrt{\Sigma_i(1,1)}}\right),$$

where $\Phi$ is the standard normal distribution function.

The likelihood was maximized using the RVS algorithm described in
section 3.4.
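
The conditioning on being non-demented at entry amounts to dividing each individual contribution by a univariate normal tail probability. A minimal sketch follows, with hypothetical function names and illustrative numbers; the unconditional contribution is taken as given.

```python
import math


def prob_not_demented(mean, var, eta0):
    """P(theta > eta0) for theta ~ N(mean, var)."""
    z = (eta0 - mean) / math.sqrt(var)
    return 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def conditional_likelihood(lik, mean, var, eta0):
    """Likelihood contribution conditional on being non-demented at entry."""
    return lik / prob_not_demented(mean, var, eta0)


# Hypothetical values: unconditional contribution 0.02, mean 35, variance 25
print(conditional_likelihood(0.02, mean=35.0, var=25.0, eta0=30.0))
```

Since the divisor is at most one, the conditional contribution is never smaller than the unconditional one, and the correction matters most for subjects whose expected latent level at entry is close to the dementia threshold.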

4.4 A Simulation 

In order to demonstrate the ability of our algorithm to maximize such a
complex likelihood we tried it on a simulated data set. We generated a sample
of size $n = 2131$ with the same age distribution at the initial visit and
the same proportion of educated and non-educated subjects as in the real data
sample from the PAQUID study. We generated 4 visits as in the real data set,
the initial visit and visits after one, three and five years. The values of
the 16 parameters were taken equal to the values estimated in the real data
set. We took as starting values:
$\beta_2 = \beta_3 = \beta_4 = \beta_1^2 = \beta_2^2 = \beta_3^2 = 0$;
$\beta_1 = 38.5$; $\beta_5 = 1$; $\eta_0 = 30$; $\eta_1 = \eta_2 = 1$;
$\eta_3 = 39$; $\sigma_{a_1} = 10^{-5}$;
$\sigma_{d_1} = \sigma_{\varepsilon_2} = 10$. The algorithm converged in 19
iterations. The results are given in Table 1. We see that the estimated
values are reasonably close to the target values and that the 95% confidence
intervals include these values. The algorithm converged toward the same point
from different starting values. We also verified the quality of the inverse
Hessian for giving estimates of the variances of the estimators of the
parameters by checking a reasonable agreement between some Wald tests and
likelihood ratio tests. On the whole, the algorithm seems to be reliable.

4.5 Model estimated from the PAQUID data 

The values of the parameters estimated from the PAQUID sample are shown
in Table 2. As expected there is a significant mean trend of decrease of
global cognitive ability (see $\beta_3$) with a shape not far from a
quadratic form (see $\beta_5$). There is a significant heterogeneity around
the intercept (see $\sigma_{a_1}$). The significant random effect for
dementia ($\sigma_{d_1}$) means that some subjects are not diagnosed demented
at repeated visits in spite of low cognitive ability.

The value of 0.58 for parameter $\eta_2$ indicates that a difference of one 
point of MMSE corresponds to a larger difference in cognitive ability at a high 
cognitive level than at a low one; in other words, the sensitivity of the MMSE 
is better at low levels than at high levels. This is graphically illustrated in 
Figure 2, which displays a grid of the cut-off values: a 
larger difference in latent process (or rather intermediate variable) values is 
necessary to make one point of difference in the MMSE at higher 
than at lower levels. This is reminiscent of the mixed linear model applied 
by Jacqmin-Gadda et al. (1997) to the square root of 30 minus MMSE (in 
fact the number of errors). 
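
The role of $\eta_2$ can be made explicit with a short derivation. It is only a sketch: it assumes that the cut-off attached to score $m$ follows the same form as the threshold equation used later in this section, and it treats $m < 30$ as continuous. The symbol $c(m)$ is introduced here for illustration and is not the paper's notation.

```latex
c(m) = 40 - \eta_1 (30 - m)^{\eta_2},
\qquad
\frac{dc}{dm} = \eta_1 \eta_2 (30 - m)^{\eta_2 - 1}.
```

For $0 < \eta_2 < 1$ the exponent $\eta_2 - 1$ is negative, so $dc/dm$ grows as $m$ approaches 30: one MMSE point spans a wider interval of the intermediate variable at high scores. With the estimates $\eta_1 = 3.93$ and $\eta_2 = 0.58$, the derivative is about 0.65 at $m = 10$ and about 1.70 at $m = 28$. 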

In order to assess the degree of realism of our model for the MMSE we 
computed the expected numbers of subjects having score $m$ at the MMSE 
at T0: this was achieved by computing for each subject the probability of 
having score $m$ and summing the 2131 probabilities. The computation of 
these probabilities was carried out with the estimated model, taking into 
account the ages, educational levels and the practice effect, as well as the 
different variability terms and the use of formulae similar to that used for 
the prediction in section 4.6. Figure 1 compares the histograms of observed 
MMSE scores with the histogram of expected numbers; it can be seen that 
they are quite similar. There is a slight discrepancy at scores 22 and 21: 
this artefact is due to the screening design for diagnosing dementia in the 
PAQUID study at T0, which used the threshold 24 and which probably led 
interviewers to put 22 or 21 rather than 24 for some subjects (to trigger the 
visit of a neurologist). 
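
The summation of per-subject probabilities can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes each subject's intermediate variable is normal with a known mean and standard deviation, and that a score is observed when this variable falls between two consecutive cut-offs. The function names, the toy cut-offs and the toy means are all hypothetical.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal variable."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def score_probabilities(mean, sd, cutoffs):
    """P(score = m) for m = 0..len(cutoffs), given increasing cut-offs.

    Score m is observed when the intermediate variable lies between
    bounds[m] and bounds[m + 1], with -inf and +inf as outer bounds.
    """
    bounds = [-math.inf] + list(cutoffs) + [math.inf]
    cdf = [normal_cdf(b, mean, sd) for b in bounds]
    return [cdf[m + 1] - cdf[m] for m in range(len(bounds) - 1)]

def expected_counts(means, sds, cutoffs):
    """Sum the per-subject score probabilities over all subjects."""
    totals = [0.0] * (len(cutoffs) + 1)
    for mean, sd in zip(means, sds):
        for m, p in enumerate(score_probabilities(mean, sd, cutoffs)):
            totals[m] += p
    return totals

# Toy example: three subjects, four score categories (hypothetical values).
toy_cutoffs = [20.0, 25.0, 30.0]
toy_means = [22.0, 27.0, 33.0]
toy_sds = [2.5, 2.5, 2.5]
counts = expected_counts(toy_means, toy_sds, toy_cutoffs)
print([round(c, 3) for c in counts])
```

The expected counts always add up to the number of subjects, which is what makes them directly comparable with an observed histogram. 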

We can make an approximate link between the threshold for dementia 
$\eta_0$ and values at the MMSE. Taking zero values for the random effect for 
dementia and errors for the MMSE, the value of the threshold approximately 
corresponds to MMSE = 19 and MMSE = 21 for low and high educational 
levels respectively. (The value 19 is found as follows. For a subject with 
low educational level we have from (6): $E(\theta^1_{ij}) = \Lambda_i(t_{ij})$ and $E(\theta^2_{ij}) = \Lambda_i(t_{ij})$; 
thus if we consider subjects for which $E(\theta^1_{ij}) = \eta_0$ they have $E(\theta^2_{ij}) = \eta_0$; the 
corresponding value $m_0$ of the MMSE score satisfies the equation 
$\eta_0 = 40 - \eta_1 (30 - m_0)^{\eta_2}$.) 
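
The arithmetic behind the values 19 and 21 can be checked with a short sketch. It solves the equation above for the MMSE score using the estimates from Table 2. The treatment of high educational level is an assumption made here for illustration: the MMSE intermediate variable is taken to be shifted upward by the education effect on MMSE (1.69), so the threshold is reached at that higher value. The function and argument names are hypothetical.

```python
def mmse_at_threshold(eta0, eta1, eta2, shift=0.0, top=40.0, max_score=30.0):
    """Solve eta0 + shift = top - eta1 * (max_score - m)**eta2 for m."""
    gap = (top - (eta0 + shift)) / eta1
    return max_score - gap ** (1.0 / eta2)

# Estimates taken from Table 2.
eta0, eta1, eta2 = 24.41, 3.93, 0.58
beta_edu_mmse = 1.69  # effect of education on MMSE

m_low = mmse_at_threshold(eta0, eta1, eta2)
# Assumption: high education shifts the MMSE intermediate variable upward.
m_high = mmse_at_threshold(eta0, eta1, eta2, shift=beta_edu_mmse)
print(round(m_low), round(m_high))
```

Under these assumptions the two solutions round to the scores quoted in the text for low and high educational levels. 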

Our model allows us to distinguish the effect of educational level on the 
latent cognitive ability on the one hand and on the MMSE score on the other. 
Educational level has a significant effect ($\beta_2$) on the intercept of the cognitive 
ability process, but not on the slope ($\beta_4$); there is a highly significant effect 
of educational level ($\beta^2_1$) for the MMSE. To sum up, (because of the positive 
$\beta^2_1$) subjects with high educational level tend to have higher MMSE than 
subjects with low educational level, for the same value of the latent process 
(true cognitive ability), leading to a diagnosis of the former as demented 
at higher MMSE levels than the latter; on the other hand (because of the 
positive $\beta_2$) subjects with high educational level tend to have higher value 
of the latent process than subjects with low educational level, leading to a 
lower rate of diagnosis of dementia for the former as compared to the latter. 
Finally, there is a significant practice effect ($\beta^2_2$) (subjects have a 
lower MMSE at the first visit than what would be expected); the interaction 
of practice with educational level ($\beta^2_3$) is not significant. 
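
The significance statements above can be cross-checked against Table 2 with a simple Wald statistic. This is a sketch under an assumption: each estimate divided by its standard deviation is treated as approximately standard normal. The paper mentions both Wald and likelihood ratio tests, so the tests actually reported may differ. The function name and dictionary keys are hypothetical.

```python
import math

def wald_test(estimate, std_error):
    """Return the Wald z statistic and its two-sided normal p-value."""
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# (estimate, standard deviation) pairs copied from Table 2.
table2 = {
    "education on intercept": (2.34, 0.55),
    "education on slope": (0.0013, 0.0018),
    "education on MMSE": (1.69, 0.45),
    "practice effect": (-1.65, 0.17),
    "education x practice": (0.29, 0.20),
}

results = {name: wald_test(est, se) for name, (est, se) in table2.items()}
for name, (z, p) in results.items():
    print(f"{name:25s} z = {z:6.2f}  p = {p:.3g}")
```

At the 5% level this approximation agrees with the text: the education effects on the intercept and on the MMSE and the practice effect are significant, while the education effect on the slope and the interaction are not. 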

Several features of these results can be best illustrated by a graphic. 
Figure 2 displays, in the latent process scale, both the grid of the cut-off 
values for the MMSE (horizontal dotted lines) and the threshold for diagnosis 
of dementia (the horizontal line with crosses at $\eta_0 = 24.41$). It also displays the 
expected value of the latent process of cognitive ability for subjects with low 
and high educational level (the curve for low educational level starts at the 
value of the intercept $\beta_1 = 32.90$). The curves are approximately parallel, with 
the curve for low educational level below; this explains why a larger incidence 
of dementia has been observed in this group (Letenneur et al., 1999). We can 
see that the decline of this expected value is very slow near the age of 65 and 
accelerates for older ages for both low and high educational levels. This is 
rather in agreement with normative values which have been established in the 
United States (Crum et al., 1993) and in France (Lechevallier-Michel et al., 
2004) although the results cannot be compared directly: one main difference 
is that normative values exclude demented subjects; another difference is that 
we model the practice effect. Figure 2 also shows the dispersion for the values 
of the latent process by showing a region in which 95% of the values for low 
educated subjects lie at each age. The lower bound curve (dashed line) 
crosses the threshold value at around age 75, so it is graphically apparent 
that a growing number of subjects will be diagnosed demented at older ages. 

Moreover Figure 2 illustrates the effect of educational level on values of 
the MMSE (for a given value of the latent process), as well as the practice 
effect on MMSE scores. It displays the expected values of intermediate 
variables for MMSE ($\theta^2_{ij}$) for subjects with low and high educational levels 
entering the study at age 75 and seen one, three, five and eight years after. 
In our model these expected values are equal to the expected value of the 
latent process for subjects with low educational level (the stars) except for 
the first visit where the value is lower due to the practice effect: this is because 
if $Ed_i = 0$ and $Pra_{ij} = 0$ we have from formula (5) $\theta^2_{ij} = \Lambda_i(t_{ij}) + \epsilon^2_{ij}$, so 
that $E(\theta^2_{ij}) = E[\Lambda_i(t_{ij})]$. As already mentioned, there is a grid indicating the 
values of the MMSE obtained as a function of the intermediate variable. For 
instance a subject with low educational level who has her intermediate variables 
equal to the expectations and entering at age 75 at T0 would have MMSE 
values 24, 25, 25, 24 and 23 at T0, T1, T3, T5 and T8 respectively. The 
expectations of the intermediate variables for subjects with high educational 
level are higher than the expected value of the latent process for the same 
time. The results illustrated in this figure contribute to the debate regarding 
the possible correction of the MMSE to take the educational level into 
account and regarding the effect of educational level on dementia. It appears 
that educational level has an effect on global cognitive ability (our latent 
process), and thus on dementia, but also has a specific effect on MMSE. 

4.6 Prediction of dementia diagnosis 

The model may be used for predicting diagnosis of dementia for subject $i$ at 
time $t_{i,n_i+1}$, given the MMSE values at the successive visits $(1, \ldots, n_i)$ and 
given that subject $i$ has not been diagnosed demented up to visit $n_i$. The 
information that we have up to visit $n_i$ is summarized by the event $\theta_i \in C_i$. 
The probability that subject $i$ is diagnosed demented at $t_{i,n_i+1}$ is 

$$P(\theta^1_{i,n_i+1} \le \eta_0 \mid \theta_i \in C_i) = \frac{P(\theta^1_{i,n_i+1} \le \eta_0, \ \theta_i \in C_i)}{P(\theta_i \in C_i)}.$$

This expression is not affected by the condition of not being diagnosed demented 
up to visit $n_i$ as the corrective conditional probability cancels out in 
the ratio. In order to compute the numerator we need the joint distribution 
of $\theta^1_{i,n_i+1}$ and $\theta_i$. This is a normal distribution with expectation: 

and variance matrix $\Sigma_i^*$ formed by the block $\Sigma_i$ augmented by the covariances 
between $\theta^1_{i,n_i+1}$ and $\theta_i$ and the variance of $\theta^1_{i,n_i+1}$. These are given by: 

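$\mathrm{cov}(\theta^1_{i,n_i+1}, \theta^1_{ij}) = \sigma^2_{\alpha_1} + t_{ij} + \sigma^2_{d_1}$, for $j = 1, \ldots, n_i + 1$; 
$\mathrm{cov}(\theta^1_{i,n_i+1}, \theta^2_{ij}) = \sigma^2_{\alpha_1} + t_{ij}$, for $j = 1, \ldots, n_i$. 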

We selected subjects that had not been diagnosed demented up to visit 
T5 and who had been seen at T8: $N = 1187$ subjects satisfied these criteria. 
We computed their individual probabilities $p_i$ of being diagnosed demented 
at visit T8, using the values of the parameters estimated from the follow-up 
up to five years. One aim was to predict the number $N_d$ of subjects 
diagnosed demented at T8: a natural predictor is the expectation of $N_d$ 
(conditional on information up to T5) which is $\sum_{i=1}^{N} p_i$. We found $\hat{N}_d = 46.6$. 
A predictive interval can be computed using the fact that 
$\mathrm{var}\, N_d = \sum_{i=1}^{N} p_i (1 - p_i)$ and treating $N_d$ as approximately normal; we found that the 
95% predictive interval was [34.1; 59.2]. We observed 56 new diagnoses at 
T8, a number inside the predictive interval. 
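
The predictor and its interval follow directly from the individual probabilities, as the sketch below shows. The individual PAQUID probabilities are not available here, so the example uses a handful of hypothetical values. The function name and the 1.96 multiplier for a 95% normal interval are choices made for this illustration.

```python
import math

def predictive_interval(probabilities, z=1.96):
    """Expected count and normal-approximation predictive interval."""
    expected = sum(probabilities)
    variance = sum(p * (1.0 - p) for p in probabilities)
    half_width = z * math.sqrt(variance)
    return expected, variance, (expected - half_width, expected + half_width)

# Hypothetical probabilities for five subjects (not PAQUID data).
toy_p = [0.02, 0.10, 0.30, 0.05, 0.50]
expected, variance, (low, high) = predictive_interval(toy_p)
print(round(expected, 2), round(variance, 4), round(low, 2), round(high, 2))
```

Read in the other direction, the reported interval [34.1; 59.2] has a half-width of about 12.55, which under the same normal approximation corresponds to a standard deviation of about 6.4 for the predicted number. 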

Another way to assess the predictive ability of our model for diagnosis 
of dementia at T8 was to consider the $p_i$s as quantitative values on which 
a classification as positive or negative could be made according to a cut-off 
value, as in the theory of diagnostic tests. Sensitivity and specificity can be 
computed for each cut-off value and the ROC curve relates sensitivities and 
specificities for the different cut-off values. Figure 3 gives the ROC curve for 
our prediction of dementia diagnosis. In particular, the area under the ROC 
curve is a summary measure of performance of the test. The area under the 
ROC curve of our model is 0.82, a rather good value. 
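
The construction of the ROC curve and its area can be sketched in a few lines. This is an illustration on hypothetical data, not a reproduction of Figure 3: each distinct predicted probability is used as a cut-off, and the area is obtained by the trapezoidal rule. The function names are hypothetical.

```python
def roc_points(scores, labels):
    """(1 - specificity, sensitivity) for every cut-off on the scores."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # A subject is classified positive when its score is >= the cut-off.
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical predicted probabilities and observed diagnoses (1 = demented).
toy_scores = [0.05, 0.10, 0.20, 0.35, 0.60, 0.80]
toy_labels = [0, 0, 1, 0, 1, 1]
toy_auc = auc(roc_points(toy_scores, toy_labels))
print(round(toy_auc, 3))
```

The area equals the proportion of (demented, non-demented) pairs in which the demented subject received the higher predicted probability, which is 8 out of 9 pairs in this toy data set. 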

5 Conclusion 

We have developed a general model for multivariate longitudinal ordinal data. 
It could be easily extended to include continuous data: we could use for test 
$k$ a continuous function $h_k(\cdot)$ of the intermediate variable, $h_k(\theta^k_{ij})$. Such a test function could be 
chosen in a family of functions depending on a parameter $\eta_k$. For instance 
Proust et al. (2006) in an analogous problem have chosen the family of beta 
cumulative distribution functions indexed by two parameters. 

When modeling cerebral ageing one would also have to model death: joint 
modeling of dementia and death has been achieved by the use of an illness-death 
model (Joly et al., 2002; Commenges et al., 2004) but cognitive ability 
was not modeled. It is not possible to rigorously treat the joint occurrence 
of diagnosis of dementia, psychometric tests and death with existing models. 
However, approximate inference can be made by considering death as 
censoring, as has been done in this paper. 

Our model is useful for jointly modeling psychometric tests and diagnosis 
of dementia but could be applied to other epidemiological contexts. 

Table 1: A simulation mimicking the PAQUID study example. 

Parameters                 Targets    Estimates   St. Dev. 
$\beta_1$                  32.90      32.51       0.36 
$\beta_2$                  2.34       3.09        0.46 
$\beta_3$                  -0.022     -0.017      0.006 
$\beta_4$                  0.0013     0.02        0.13 
$\beta_5$                  1.84       1.91        0.10 
$\beta^2_1$                1.69       1.41        0.35 
$\beta^2_2$                -1.65      -1.53       0.15 
$\beta^2_3$                0.29       0.25        0.17 
$\eta_0$                   24.41      24.38       0.60 
$\eta_1$                   3.93       3.94        0.16 
$\eta_2$                   0.58       0.58        0.01 
$\eta_3$                   36.64      36.52       0.15 
$\sigma_{\alpha_1}$        2.04       2.10        0.21 
$\sigma_{d_1}$             2.68       2.49        0.18 
$\sigma_{\epsilon_2}$      2.55       2.59        0.11 

Table 2: Results from the analysis of the five-year follow-up of the PAQUID 
study 

Parameters                                                                        Estimates   St. Dev. 
$\beta_1$: intercept for $\Lambda$                                                32.90       0.41 
$\beta_2$: effect of education on intercept                                       2.34        0.55 
$\beta_3$: slope of $\Lambda$                                                     -0.022      0.008 
$\beta_4$: effect of education on slope                                           0.0013      0.0018 
$\beta_5$: power of $t$                                                           1.84        0.11 
$\beta^2_1$: effect of education on MMSE                                          1.69        0.45 
$\beta^2_2$: practice effect for MMSE                                             -1.65       0.17 
$\beta^2_3$: interaction education x practice effect                              0.29        0.20 
$\eta_0$: threshold for dementia                                                  24.41       0.65 
$\eta_1$: multiplicative factor for the cut-off model of MMSE                     3.93        0.19 
$\eta_2$: power for the cut-off model of MMSE                                     0.58        0.006 
$\eta_3$: value of $c_{29}$                                                       36.64       0.17 
$\sigma_{\alpha_1}$: variance of the random effect for intercept                  2.04        0.21 
$\sigma_{d_1}$: variance of the random effect for dementia                        2.68        0.20 
$\sigma_{\epsilon_2}$: variance of error in the intermediate equations for MMSE   2.55        0.13 

Figure 1: Histogram of the MMSE score at the initial visit. Black: observed 
histogram; grey: expected numbers. 

Figure 2: Mean evolution of the latent process based on the follow-up of 
five years in the PAQUID study for low (dashed line) and high (plain line) 
educational level; the band (delimited by the dashed lines) shows a region 
where 95% of the values for low educated subjects lie; horizontal line with 
crosses is the threshold value for dementia; expected intermediate variables 
for subjects of low (stars) and high (open circles) educational level entering at 
75 years in the study and seen at T0, T1, T3, T5 and T8; the grid shows the 
values of the MMSE obtained for specific values of the intermediate variable. 

Figure 3: ROC curve showing the ability of the model to predict dementia 
at the eight-year visit based on the follow-up of five years in the PAQUID 
study. 